Team Braavos - Predicting and Visualizing College Admissions


Overview and Motivation

Every year, four million students apply to US colleges without a good idea of their chances of getting in. Fueled by US News rankings, colleges puff up their rejection rates while myopically finding students through the narrow lens of standardized test scores. This project provides data-science-based probabilities of admission along with interactive visualizations, allowing students and parents to investigate how specific aspects of an application affect the chances of acceptance at various schools, and therefore to decide where best to focus their time and money. Users can also view summaries of application and acceptance data to draw further inferences about their potential success in applying to different colleges. Finally, detailed visualizations compare specific schools by the factors that drive their acceptance decisions.

Team and Roles

Team Braavos consists of:

  • Marina Adario
  • Dion Hagan
  • Malcolm Mason Rodriguez
  • David Wihl

We operated as a fully agile team without predefined roles. Anyone could submit, check in, or work on any story.

The team maintained a Kanban board in Trello. The "To Do" column was a regularly triaged and sorted list of upcoming tasks. Once an individual completed a task, he or she took the next item off the top of the list.

Communication Rules:

We collaborated regularly during the week via Slack. Email was used only when necessary. Electronic signatures sufficed when signatures were required for submission.

Physical meetings occurred once per week on Wednesday, either during or just after Studio, as necessary.

Collaboration Policy

The entire project is version controlled through GitHub. Any non-trivial story had a separate branch, with regular check-ins and merges. Each story was short, typically no longer than two days. Blocked stories were marked in Trello with a red bar label.

Elevator Pitch

“Every year, four million students apply to US colleges without a good idea of their chances of getting in. Fueled by US News rankings, colleges puff up their rejection rates while myopically finding students through the narrow lens of standardized test scores. ChanceMe provides data-science-based probabilities of admission along with interactive visualizations, allowing students and parents to investigate how specific aspects of an application affect the chances of acceptance at various schools, and therefore to decide where best to focus their time and money. Users can also view summaries of application and acceptance data to draw further inferences about their potential success in applying to different colleges. We then aim to disrupt the entire application process so quality students can be matched to the right school and schools can fulfill their desired mix of students.”

Intended Audience

The primary intended audience is high school students applying to college. For this audience, we had to make the visualizations intuitive and highly interactive. A secondary audience is the parents of these students, who wish to dig deeper into the factors of acceptance so they can plan the application process.

Questions

Question 1: Likelihood of First Choice Acceptance

The first visualization on the first page is intended to answer the most pressing question of the high school student: what are my chances of acceptance at my first choice school?

Question 2: Likelihood of Alternate Choice Acceptance

Given that the vast majority of applicants apply to multiple schools, it is very useful to know the likelihood of getting into alternative schools. This also helps a student prioritize backup choices. By definition, a backup school must have a higher probability of acceptance than the first-choice school. If two backups are equally desired, the applicant should prioritize applying to the school with the greater probability of acceptance.
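The backup rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name rank_backups, the school names, and the probabilities are all hypothetical, and the numbers are made up rather than produced by the project's model.

```python
from typing import Dict, List, Tuple


def rank_backups(probabilities: Dict[str, float], first_choice: str) -> List[Tuple[str, float]]:
    """Return backup candidates sorted from most to least likely acceptance.

    A school counts as a backup only if its predicted probability is strictly
    higher than that of the first-choice school.
    """
    first_choice_probability = probabilities[first_choice]
    backups = [
        (school, probability)
        for school, probability in probabilities.items()
        if school != first_choice and probability > first_choice_probability
    ]
    # Highest probability first, so equally desired backups are ordered by odds.
    return sorted(backups, key=lambda pair: pair[1], reverse=True)


# Made-up probabilities for illustration only (hypothetical schools).
predicted = {"School A": 0.12, "School B": 0.45, "School C": 0.30, "School D": 0.08}
ranked = rank_backups(predicted, first_choice="School A")
print(ranked)
```

With School A as the first choice, School D is dropped because its probability is lower than School A's, and the remaining schools are listed with the most likely acceptance first.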

Question 3: What Factors Determine Acceptance

Our second visualization permits drilling down into the details of acceptance to determine, per factor and per school, which factors matter most to a given school. This helps a parent, guidance counselor or mature student with long-term planning: whether to retake standardized tests, focus on GPA, take more AP exams or spend significant time on extracurriculars.

Feature List

To address these questions, we created visualizations consisting of two pages.

Page 1 - Predictions

This is the home page of the site. There are two areas for data entry:

  • Area 1 - demographic data that cannot be changed, including gender, US citizenship, first to attend college, etc.
  • Area 2 - college admission factors that may vary prior to college application such as GPA, standardized test scores, number of AP exams taken.

As the applicant changes values in area 2, the visualization updates. The intent is to be highly interactive, almost game-like, encouraging the applicant to attempt many different scenarios. Scenarios can be compared, allowing applicants to try different combinations of factors.
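The split between fixed demographic data and adjustable admission factors, and the idea of comparing scenarios, might be modeled roughly as follows. This is a minimal sketch, not the project's implementation: the class names, field names, and especially toy_predict are hypothetical placeholders, and the scoring formula is invented purely so the example runs.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class Demographics:
    """Area 1: data the applicant cannot change (field names are assumptions)."""
    gender: str
    us_citizen: bool
    first_generation: bool


@dataclass(frozen=True)
class AdmissionFactors:
    """Area 2: factors that may still change before applying."""
    gpa: float
    test_score: int
    ap_exams: int


def toy_predict(demographics: Demographics, factors: AdmissionFactors) -> float:
    """Placeholder scoring function, NOT the project's prediction model."""
    score = 0.15 * factors.gpa + 0.0002 * factors.test_score + 0.02 * factors.ap_exams
    return max(0.0, min(1.0, score - 0.5))  # clamp to a valid probability


def compare_scenarios(
    demographics: Demographics,
    scenarios: Dict[str, AdmissionFactors],
    predict: Callable[[Demographics, AdmissionFactors], float],
) -> Dict[str, float]:
    """Score every named scenario against the same fixed demographics."""
    return {name: predict(demographics, factors) for name, factors in scenarios.items()}


applicant = Demographics(gender="F", us_citizen=True, first_generation=False)
baseline = AdmissionFactors(gpa=3.5, test_score=1300, ap_exams=2)
scenarios = {
    "baseline": baseline,
    # replace() copies the baseline and changes only the named field.
    "retake test": replace(baseline, test_score=1400),
    "more AP exams": replace(baseline, ap_exams=4),
}
results = compare_scenarios(applicant, scenarios, toy_predict)
for name, probability in results.items():
    print(f"{name}: {probability:.3f}")
```

Keeping both records immutable means each scenario is a separate copy of the baseline, so scenarios can be listed side by side without one edit silently changing another.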

Page 2 - Data Drill Down

The second page has several linked visualizations that allow the user to drill down into the specific factors that weigh into the college acceptance process. This allows the applicant to compare and contrast different schools, as well as to plan time for the activities most likely to maximize the chance of acceptance.

Summary of Features

  • Demographic data entry
  • Admission factors data entry
  • School selection by Ivy / Non-Ivy / Geographic Region
  • Sorting results by school or best probability
  • Acceptance Probability Visualization
  • Drill down visualizations
  • College comparison visualizations via Heat Map
  • Factor comparison with subsetting by school or candidate
  • 3D Visualization of Prediction Results

Project Storyboard

In [1]:
from IPython.display import Image
Image(filename='img/Final_project_story_board.png')
Out[1]: [image: img/Final_project_story_board.png]

Tasks and Timeline

(This is for a general idea only. The reference tasks and timeline are in Trello.)

Target:
  • ✔ Choose domain (completed 3/28)
  • ✔ Define question (completed 3/28)
  • ✔ Explore existing solutions (completed 3/28)
  • Formulate data analysis tasks (no longer needed)
Data Wrangling:
  • ✔ Find and clean data (completed 3/28)
  • ✔ Exploratory Data Analysis (completed 3/28)
  • ✔ Transform and summarize data (completed 3/28)
Design
  • ✔ Design Visual Encoding (completed 4/25)
  • ✔ Design Interaction - Prediction (completed 4/4)
  • ✔ Design layout and storytelling - Prediction (completed 4/4)
  • ✔ Design Interaction - Drill down (completed 4/25)
  • ✔ Perform 'paper' user testing (completed in studio 4/6)
Implement
  • ✔ Rapid prototype - Drill down (initial prototype complete 4/11)
  • ✔ Design system architecture (completed 4/11)
  • ✔ Rapid prototype - Prediction (due 4/18)
  • ✔ Innovative visualization: Distorted map of acceptance distance (completed 5/2)
  • ✔ Linked visualizations (completed 4/25)
  • Define Data Structures (no longer needed)
Evaluate
  • ✔ Perform user testing with prototype (completed in studio and elsewhere 4/20)
  • ✔ Is the abstraction right? (completed 4/20)
  • ✔ Does encoding and interaction support the task? (completed 4/21)
  • ✔ Does encoding and interaction provide new insights? (completed 4/21)
Deliverables
  • ✔ Process Book (completed 5/2)
  • ✔ Screencast (completed 5/2)
  • Demos / design fair (due 5/4)
Time Permitting / Nice to Have
  • Performance optimizations:
    • Determine bottlenecks and explore efficient algorithms
    • Random Forest in JavaScript (webservice performance significantly improved)
    • ✔ cache the CSV (collegelist and the data) in localStorage (completed 4/4)
    • train the Random Forest asynchronously
    • ✔ populate list of colleges from CSV (or cache) instead of hard coded (completed 5/1)